fix(sqlite-provider): use caller-provided key column name by anoop-narang · Pull Request #9 · hotdata-dev/datafusion-vector-search-ext

anoop-narang · 2026-03-18T10:47:11Z

Summary

SqliteLookupProvider previously hardcoded row_idx as the key column name in CREATE TABLE and WHERE clauses
This caused no such column: row_idx errors when the caller's schema used a different key column name (e.g. _key)
Now derives the key column name from the first field in the provided schema, matching the existing convention that the first schema field is always the primary key

Test plan

All existing tests pass (they use row_idx as key col — still works since name comes from schema)
cargo fmt --check clean
cargo clippy --all-features -- -D warnings clean

claude · 2026-03-18T10:49:21Z

src/udf.rs

    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()


P1 — breaking API change bundled into a bug-fix PR.

Removing .sqrt() changes l2_distance from returning actual Euclidean distance to returning squared L2. While sort order is preserved (ranking is still correct), any user with a threshold-based filter silently gets wrong results:

-- before: this excluded rows farther than 1.5 actual units apart -- after: this excludes rows farther than sqrt(1.5) ≈ 1.22 actual units apart WHERE l2_distance(vector, query) < 1.5

The consistency argument (match USearch's internal L2sq) is valid, but the right fix is to normalize the index path (apply sqrt to USearch's returned distances) — not to change the UDF's contract. The function name l2_distance strongly implies actual Euclidean distance.

Options:

Revert this change and apply sqrt to USearch distances on the index path instead.

If L2sq semantics are intentional, rename to l2sq_distance (or l2_distance_squared) so callers aren't silently broken.

This change should also be a separate PR from the SQLite key-column fix.

This PR only contains the sqlite-provider key column fix. The l2_distance sqrt removal was in PR #8 (already merged and squash-merged to main). After rebasing, the diff for this PR contains only the sqlite_provider.rs changes. The bot appears to have reviewed a stale diff before the rebase.

claude · 2026-03-18T10:49:24Z

tests/execution.rs

    let sql = format!(
        "SELECT id FROM items WHERE label = 'nonexistent' ORDER BY l2_distance(vector, {Q}) ASC LIMIT 2"
    );
    let ids = collect_ids(&ctx, &sql).await;


P2 — the actual bug (non-row_idx key column in SqliteLookupProvider) has no regression test.

The PR test plan notes that existing tests still use row_idx, which is coincidentally the default. A test that builds a SqliteLookupProvider with a schema whose first field is named something other than row_idx (e.g. _key) and then calls fetch_by_keys would directly exercise the fixed code path and guard against future regressions.

Added in 6934d3a — test_custom_key_column_name exercises the provider with _key as the key column, testing both fetch_by_keys and projection.

claude

Review

Issues

P1 — src/udf.rs line 31: breaking semantic change to l2_distance

Removing .sqrt() changes the UDF's return value from actual Euclidean distance to squared L2. Sort order is preserved so ANN ranking is unaffected, but threshold-based filters silently produce wrong results for any threshold ≠ 1.0. This is a public API contract violation bundled into a SQLite bug-fix PR.

P2 — tests/execution.rs: no regression test for the stated bug fix

All SQLite-provider tests use row_idx as the key column, which happened to work before the fix. There is no test that exercises SqliteLookupProvider with a non-row_idx key column (e.g. _key), which is the exact scenario the PR claims to fix.

Action Required

Either revert the l2_distance sqrt removal and normalize the index path instead, or rename the UDF to l2sq_distance to avoid silent breakage for callers with distance thresholds. Either way, split it into a separate PR.
Add a SqliteLookupProvider test that uses a key column name other than row_idx.

The SqliteLookupProvider previously hardcoded "row_idx" as the key column name in CREATE TABLE and WHERE clauses. This caused errors when callers used a different key column name (e.g. "_key"). Now derives the key column name from the first field in the provided schema, making the provider work with any key column name.

claude · 2026-03-18T10:55:16Z

src/sqlite_provider.rs

        parquet_col_indices: &[usize],
    ) -> DFResult<Self> {
+        // The first field in the schema is the key column (INTEGER PRIMARY KEY).
+        let key_col = schema.field(0).name().clone();


Suggestion: all existing tests happen to use row_idx as the key column name, so the only coverage here is "dynamic name produces the same SQL as the old hardcoded string." Consider adding a test in sqlite_provider_test.rs that builds a provider with a schema whose first field is named something other than row_idx (e.g. _key) and confirms that fetch_by_keys returns the correct rows. That would give you a direct regression guard for the actual bug scenario.

Done — added test_custom_key_column_name that uses _key as the key column name.

Exercises SqliteLookupProvider with "_key" as the key column (the scenario used by runtimedb), verifying both fetch_by_keys and projection work with non-default key column names.

claude · 2026-03-18T11:03:07Z

src/sqlite_provider.rs

        let keys_vec = keys.to_vec();
        let pool = self.pool.clone();
        let table_name = self.table_name.clone();
+        let key_col = self.key_col.clone();


Non-blocking suggestion: fetch_by_keys takes a _key_col: &str parameter (from the PointLookupProvider trait) that is silently ignored here in favour of self.key_col. This was already the pre-existing behaviour in HashKeyProvider, so this PR is consistent, but it means a caller who passes the wrong column name gets no feedback.\n\nConsider a debug_assert_eq!(_key_col, self.key_col) guard, or removing the parameter from the trait entirely in a follow-up.

claude bot reviewed Mar 18, 2026

View reviewed changes

claude bot requested changes Mar 18, 2026

View reviewed changes

anoop-narang force-pushed the fix/sqlite-provider-dynamic-key-col branch from f70c668 to e49e0d1 Compare March 18, 2026 10:53

claude bot reviewed Mar 18, 2026

View reviewed changes

claude bot previously approved these changes Mar 18, 2026

View reviewed changes

test(sqlite-provider): add test for custom key column name

6934d3a

Exercises SqliteLookupProvider with "_key" as the key column (the scenario used by runtimedb), verifying both fetch_by_keys and projection work with non-default key column names.

anoop-narang dismissed claude[bot]’s stale review via 6934d3a March 18, 2026 11:01

claude bot reviewed Mar 18, 2026

View reviewed changes

claude bot approved these changes Mar 18, 2026

View reviewed changes

anoop-narang merged commit 42abb40 into main Mar 18, 2026
5 checks passed

anoop-narang deleted the fix/sqlite-provider-dynamic-key-col branch March 18, 2026 11:04

anoop-narang mentioned this pull request Mar 18, 2026

feat(tracing): granular spans + auto-index migration for SQLite #10

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sqlite-provider): use caller-provided key column name#9

fix(sqlite-provider): use caller-provided key column name#9
anoop-narang merged 2 commits intomainfrom
fix/sqlite-provider-dynamic-key-col

anoop-narang commented Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

anoop-narang Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

anoop-narang Mar 18, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot Mar 18, 2026

Uh oh!

anoop-narang Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anoop-narang commented Mar 18, 2026

Summary

Test plan

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

anoop-narang Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

anoop-narang Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Review

Issues

Action Required

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

anoop-narang Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant